Acoustic observation context modeling in segment based speech recognition
Authors
Abstract
This paper describes a novel method that models the correlation between acoustic observations in contiguous speech segments. The basic idea behind the method is that acoustic observations are conditioned not only on the phonetic context but also on the preceding acoustic segment observation. The correlation between consecutive acoustic observations is modeled by polynomial mean trajectory segment models. This method is an extension of conventional segment modeling approaches in that it describes the correlation of acoustic observations not only inside segments but also between contiguous segments. It is also a generalization of phonetic context (e.g., triphone) modeling approaches because it can model acoustic context and phonetic context at the same time. In a speaker-independent phoneme classification test, the proposed method resulted in a 7 to 9% reduction in error rate compared to the traditional triphone segmental model system and a 31% reduction compared to a similar triphone HMM (hidden Markov model) system.
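As a rough illustration of the polynomial mean trajectory idea mentioned in the abstract, the sketch below fits a low-order polynomial in normalized time to the frames of a single segment and scores the segment under a diagonal Gaussian around that trajectory. It is a generic within-segment formulation, not the paper's model: the function names, the polynomial order, and the fixed variance are assumptions made for illustration, and the conditioning on the preceding segment's observation that the abstract describes is not implemented here.

```python
import numpy as np

def design_matrix(num_frames, order):
    """Polynomial basis (1, t, t^2, ...) over normalized time t in [0, 1]."""
    # A single-frame segment is mapped to t = 0 (an assumption of this sketch).
    t = np.linspace(0.0, 1.0, num_frames) if num_frames > 1 else np.zeros(1)
    return np.vander(t, order + 1, increasing=True)

def fit_trajectory(frames, order=2):
    """Least-squares fit of a polynomial mean trajectory.

    frames: array of shape (num_frames, feature_dim)
    returns: coefficients of shape (order + 1, feature_dim)
    """
    Z = design_matrix(frames.shape[0], order)
    coeffs, *_ = np.linalg.lstsq(Z, frames, rcond=None)
    return coeffs

def segment_log_likelihood(frames, coeffs, var):
    """Log-likelihood of a segment under a diagonal Gaussian whose mean
    follows the polynomial trajectory; var has shape (feature_dim,)."""
    Z = design_matrix(frames.shape[0], coeffs.shape[0] - 1)
    resid = frames - Z @ coeffs
    return -0.5 * np.sum(resid**2 / var + np.log(2.0 * np.pi * var))

# Hypothetical usage: a 10-frame segment with 2-dimensional features
# generated from a known quadratic trajectory.
true_coeffs = np.array([[1.0, -2.0], [0.5, 3.0], [2.0, 0.0]])
frames = design_matrix(10, 2) @ true_coeffs
est_coeffs = fit_trajectory(frames, order=2)
loglik = segment_log_likelihood(frames, est_coeffs, var=np.ones(2))
```

Because the trajectory is defined over normalized time, segments of different durations share one set of coefficients, which is what lets a segment model describe correlation among the frames inside a segment.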
Similar resources
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Acoustic Modeling Improvements in a Segment-based Speech Recognizer
In this paper we report on some recent improvements to the acoustic modeling in a segment-based speech recognition system. Context-dependent segment models and improved pronunciation modeling are shown to reduce word error rates in a telephone-based, conversational system by over 18%, while the technique of Gaussian selection reduces overall computation by more than a factor of two.
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Unifying HMM and phone-pair segment models
It is well known that the HMM is ineffective in modeling the dynamics of speech due to its piecewise stationarity and independent observation assumptions. In this paper, we propose an analytically tractable framework in which the two modeling techniques are combined to reach a jointly optimal decision in both training and recognition. The combination is achieved by coupling the hidden processes f...
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...